Artificial intelligence in lung cancer screening: Detection, classification, prediction, and prognosis

Abstract
Background: The ability of artificial intelligence (AI) to extract information from images and to fit complex models has led to its adoption across many medical fields. With the continued evolution of deep learning, particularly convolutional neural networks (CNNs), AI now has a broad range of applications in lung cancer screening, including lung segmentation, nodule detection, false-positive reduction, nodule classification, and prognosis.
Methodology: This review first analyzes the current status of AI technologies. It then examines the applications of AI in lung cancer screening, including lung segmentation, nodule detection, and nodule classification, and assesses the potential of AI to increase nodule detection sensitivity and reduce false-positive rates. Finally, it discusses the challenges and future directions of AI in lung cancer screening.
Results: AI holds substantial promise in lung cancer screening. It shows significant potential for improving nodule detection sensitivity, reducing false-positive rates, and classifying nodules, and it also shows value in predicting nodule growth and pathological and genetic subtypes.
Conclusions: AI offers a promising supportive approach to lung cancer screening, with considerable potential to increase nodule detection sensitivity, reduce false-positive rates, and classify nodules. However, the generalizability and interpretability of AI results need further improvement. Future research should focus on large-scale validation of new deep learning-based algorithms and on multi-center studies to improve the efficacy of AI in lung cancer screening.


GoogLeNet:
In 2014, a team at Google introduced the GoogLeNet architecture. It is a convolutional neural network inspired by the LeNet CNN, and it introduced the Inception module. GoogLeNet was trained with data augmentation based on image distortions; later versions of the Inception family added batch normalization and the RMSprop optimizer.(4) The GoogLeNet architecture is 22 layers deep and contains a total of 9 inception modules.
The first layer is the input layer, which accepts a 224 × 224 × 3 RGB image. A convolution with a 7 × 7 filter and a stride of 2 is applied first, giving a layer size of 112 × 112 × 64, where 64 is the number of filters used.
Next, max pooling with a 3 × 3 filter and a stride of 2 gives a layer size of 56 × 56 × 64; pooling leaves the number of channels (64) unchanged.
A convolution with a 3 × 3 filter and a stride of 1, preceded by a 1 × 1 reduction layer, is then applied, resulting in a layer size of 56 × 56 × 192 (192 filters).
Max pooling with a 3 × 3 filter and a stride of 2 is applied again, giving a layer size of 28 × 28 × 192.
Two inception modules are then applied, resulting in a layer size of 28 × 28 × 480.
Max pooling with a 3 × 3 filter and a stride of 2 reduces this to 14 × 14 × 480.
Five inception modules are then applied, yielding a layer size of 14 × 14 × 832.
Max pooling with a 3 × 3 filter and a stride of 2 then gives a layer size of 7 × 7 × 832.
Two more inception modules produce a layer size of 7 × 7 × 1024.
Finally, average pooling with a 7 × 7 filter and a stride of 1 gives a layer size of 1 × 1 × 1024. After 40% dropout, a linear layer followed by a softmax produces the final 1 × 1 × 1000 output, one value per ImageNet class.
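The layer sizes listed above follow from the usual output-size rule for a convolution or pooling window, floor((n + 2p - k) / s) + 1 for input size n, kernel k, stride s, and padding p. The sketch below traces them step by step. It is a minimal sketch under stated assumptions: the padding values (3 for the 7 × 7 convolution, 1 for the 3 × 3 convolution and max pools) are chosen so that the sizes match the text, inception stages are modelled only by their output channel count because they keep the spatial size, and the 1 × 1 reduction layer before the 3 × 3 convolution is omitted because it does not change the spatial size. The names `conv_out`, `STEPS`, and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling window (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (label, kernel, stride, padding, output channels).
# kernel None: an inception stage (or the final linear layer) that keeps the spatial size.
# output channels None: a pooling step that keeps the channel count.
STEPS = [
    ("conv 7x7, stride 2", 7, 2, 3, 64),
    ("max pool 3x3, stride 2", 3, 2, 1, None),
    ("conv 3x3, stride 1", 3, 1, 1, 192),
    ("max pool 3x3, stride 2", 3, 2, 1, None),
    ("2 inception modules", None, None, None, 480),
    ("max pool 3x3, stride 2", 3, 2, 1, None),
    ("5 inception modules", None, None, None, 832),
    ("max pool 3x3, stride 2", 3, 2, 1, None),
    ("2 inception modules", None, None, None, 1024),
    ("avg pool 7x7, stride 1", 7, 1, 0, None),
    ("40% dropout + linear", None, None, None, 1000),
]


def trace_shapes(size=224, channels=3, steps=STEPS):
    """Return (label, height, width, channels) after each step for a square input."""
    trace = []
    for label, kernel, stride, padding, out_channels in steps:
        if kernel is not None:
            size = conv_out(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        trace.append((label, size, size, channels))
    return trace


for row in trace_shapes():
    print(row)
```

The last line printed is `('40% dropout + linear', 1, 1, 1000)`, matching the 1 × 1 × 1000 output described above.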

ReNet:
ReNet is a neural network architecture for image data that is built from recurrent neural networks (RNNs) rather than convolutions.(5)
The core idea of ReNet is to sweep RNNs across an image in several directions to capture spatial relationships and structural information. Whereas traditional convolutional neural networks (CNNs) extract features with fixed-size convolutional kernels, ReNet applies RNNs to sequences of image patches, so that after all sweeps each output feature can depend on the entire image.
The operational process of ReNet can be outlined as follows:
Divide the input image into non-overlapping patches arranged on a regular grid.
Flatten each patch into a vector and treat each column (and later each row) of patches as a sequence processed by an RNN; the RNN can be an LSTM (Long Short-Term Memory) or a GRU (Gated Recurrent Unit).
Sweep the RNNs vertically and horizontally, in both directions along each axis (four sweeps in total), to capture features and relationships across the image.
Finally, concatenate the hidden states produced by the sweeps at each patch location to obtain the feature map passed to the next layer.
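The four sweeps can be sketched in pure Python. This is a minimal sketch, not the ReNet reference implementation: it assumes a plain tanh RNN in place of an LSTM or GRU, 2 × 2 patches, random untrained weights, and a vertical-then-horizontal sweep order; all function names are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_sweep(sequence, w_in, w_rec):
    """Run a plain tanh RNN over a sequence of vectors and return every hidden state."""
    h = [0.0] * len(w_rec)
    states = []
    for x in sequence:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        states.append(h)
    return states


def bidirectional(sequence, forward, backward):
    """Sweep the sequence in both directions and concatenate the states at each position."""
    fwd = rnn_sweep(sequence, *forward)
    bwd = rnn_sweep(sequence[::-1], *backward)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def to_patches(image, p):
    """Split an H x W image into a grid of non-overlapping p x p patches, each flattened."""
    rows, cols = len(image) // p, len(image[0]) // p
    return [[[image[i * p + a][j * p + b] for a in range(p) for b in range(p)]
             for j in range(cols)]
            for i in range(rows)]


def renet_layer(grid, vertical, horizontal):
    """Vertical sweeps over every column of patches, then horizontal sweeps over every row."""
    rows, cols = len(grid), len(grid[0])
    v = [[None] * cols for _ in range(rows)]
    for j in range(cols):
        column = [grid[i][j] for i in range(rows)]  # top-to-bottom sequence of patches
        for i, feature in enumerate(bidirectional(column, *vertical)):
            v[i][j] = feature
    return [bidirectional(v[i], *horizontal) for i in range(rows)]


def make_rnn(n_in, hidden, rng):
    """Random (untrained) input and recurrent weight matrices for one sweep direction."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
    return w_in, w_rec


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
grid = to_patches(image, 2)  # 4 x 4 grid of 4-dimensional patch vectors
hidden = 3
vertical = (make_rnn(4, hidden, rng), make_rnn(4, hidden, rng))
horizontal = (make_rnn(2 * hidden, hidden, rng), make_rnn(2 * hidden, hidden, rng))
features = renet_layer(grid, vertical, horizontal)
print(len(features), len(features[0]), len(features[0][0]))  # 4 4 6
```

Each output vector concatenates the two horizontal sweep directions, and the horizontal sweeps read features that already concatenate the two vertical directions, so every position ends up informed by the whole image.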
ReNet's strength lies in capturing long-range spatial relationships within images, and its sequential formulation can, in principle, be extended to temporal image sequences such as videos. However, because the RNN sweeps are inherently sequential and harder to parallelize than convolutions, ReNet can be computationally expensive on large images.

DenseNet:
DenseNet is a neural network architecture designed to address the vanishing gradient problem and strengthen information flow in deep networks.
The core principle of DenseNet is dense connectivity: each layer receives the feature maps of all preceding layers, which promotes feature reuse and strengthens gradient flow. This contrasts with traditional architectures, in which each layer is connected only to the next one. As a result, DenseNet alleviates vanishing gradients during training and improves convergence and information propagation.

The main characteristics of DenseNet are:
Dense Blocks: A DenseNet is organized into dense blocks, each consisting of multiple convolutional layers. Within a dense block, each layer receives the concatenated feature maps of all preceding layers in the block, and its own output is passed on to every subsequent layer. This dense connectivity fosters rich feature representations and encourages feature reuse.
Transition Layers: Transition layers between dense blocks reduce the number and spatial size of the feature maps, keeping the growth of the network and its computational cost under control. In the original DenseNet, each transition layer consists of a 1 × 1 convolution followed by 2 × 2 average pooling.
Bottleneck Layers: Within each dense block, a 1 × 1 bottleneck convolution may be placed before each 3 × 3 convolution to reduce the number of input feature maps, which helps manage computational cost.
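Because every layer sees all earlier outputs, the channel count inside a dense block grows linearly: with k0 input channels and growth rate k, layer l receives k0 + l·k channels. The sketch below does this bookkeeping for a whole network. It assumes the DenseNet-121 configuration from the original DenseNet paper (64 initial channels, growth rate 32, dense blocks of 6, 12, 24, and 16 layers, transition compression 0.5, bottleneck width 4k); the function names are hypothetical.

```python
def dense_block(in_channels, num_layers, growth_rate):
    """Channel bookkeeping for one dense block.

    Layer n (0-based) sees the concatenation of the block input and the
    growth_rate maps produced by each of the n earlier layers. A bottleneck
    1x1 convolution squeezes that input to 4 * growth_rate maps before the
    3x3 convolution adds growth_rate new maps.
    """
    layer_inputs = [in_channels + n * growth_rate for n in range(num_layers)]
    bottleneck_width = 4 * growth_rate
    out_channels = in_channels + num_layers * growth_rate
    return layer_inputs, bottleneck_width, out_channels


def densenet_channels(init_channels=64, block_sizes=(6, 12, 24, 16),
                      growth_rate=32, compression=0.5):
    """Channel count after every dense block and transition layer."""
    channels, history = init_channels, []
    for i, num_layers in enumerate(block_sizes, start=1):
        _, _, channels = dense_block(channels, num_layers, growth_rate)
        history.append((f"dense block {i}", channels))
        if i < len(block_sizes):  # no transition layer after the last block
            # the 1x1 convolution compresses the channels; the 2x2 average
            # pooling (not tracked here) halves the spatial size
            channels = int(channels * compression)
            history.append((f"transition {i}", channels))
    return history


for stage in densenet_channels():
    print(stage)
```

With these settings the final dense block outputs 1024 channels; the bottleneck keeps each 3 × 3 convolution's input at 128 channels even when a layer receives many more concatenated maps.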
The benefits of DenseNet include efficient parameter usage and improved information flow and gradient propagation thanks to its dense connections. The architecture has also been reported to reduce overfitting and to perform well even with limited training data. DenseNet has shown strong results on various computer vision tasks, including image classification, object detection, and segmentation.

U-Net:
U-Net is a convolutional neural network for biomedical image segmentation, introduced by Olaf Ronneberger and colleagues in 2015.(7) Its core principle is a U-shaped encoder-decoder architecture with skip connections that preserve spatial information and enable accurate segmentation. This design is particularly effective where precise localization of objects in images is essential.
The main components of the U-Net architecture are:

Encoder:
The encoder portion of the U-Net consists of a series of convolutional and max pooling layers that progressively reduce the spatial dimensions of the input image while increasing the number of feature channels. This extracts high-level features and semantic information from the input.

Decoder:
The decoder portion of the U-Net upsamples the feature maps back toward the input resolution. It comprises a sequence of upsampling (up-convolution) and convolutional layers; each upsampling step doubles the spatial size of the feature maps to restore spatial information.

Skip Connections:
One of the distinctive features of U-Net is its skip connections, which link each encoder level to the corresponding decoder level. The encoder feature maps are concatenated with the upsampled decoder feature maps, giving the decoder access to high-resolution features. This enables accurate localization by combining detailed local information from the encoder with the contextual information carried through the deeper layers to the decoder.

Final Layer:
The final layer is typically a 1 × 1 convolution followed by a softmax activation, which produces pixel-wise class probabilities. Each pixel is then assigned the most probable class label, indicating, for example, the presence or absence of the target object.
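The per-pixel softmax and label assignment can be sketched in plain Python. The sketch assumes the final 1 × 1 convolution has already produced one score map per class, stored as nested lists `logits[class][row][col]`; the function names and the two-class example are hypothetical.

```python
import math


def pixelwise_softmax(logits):
    """Convert logits[c][i][j] into probabilities normalised over classes at each pixel."""
    num_classes, h, w = len(logits), len(logits[0]), len(logits[0][0])
    probs = [[[0.0] * w for _ in range(h)] for _ in range(num_classes)]
    for i in range(h):
        for j in range(w):
            scores = [logits[c][i][j] for c in range(num_classes)]
            top = max(scores)  # subtract the maximum for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            for c in range(num_classes):
                probs[c][i][j] = exps[c] / total
    return probs


def label_map(probs):
    """Assign each pixel the class with the highest probability (first class on ties)."""
    num_classes, h, w = len(probs), len(probs[0]), len(probs[0][0])
    return [[max(range(num_classes), key=lambda c: probs[c][i][j]) for j in range(w)]
            for i in range(h)]


# Two classes (0 = background, 1 = target) on a hypothetical 2 x 2 output map.
logits = [
    [[2.0, 0.0], [0.0, 0.0]],  # class 0 scores
    [[0.0, 1.0], [0.0, 3.0]],  # class 1 scores
]
print(label_map(pixelwise_softmax(logits)))  # [[0, 1], [0, 1]]
```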
U-Net's architecture is well suited to applications where pixel-level segmentation accuracy is crucial, such as medical image segmentation, cell segmentation, and similar tasks. The skip connections contribute to its ability to capture fine details while maintaining a broader context. U-Net has demonstrated impressive performance in various segmentation challenges and remains a popular choice in computer vision.
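The interplay of encoder, decoder, and skip connections is easiest to see by tracing feature-map sizes. The sketch below assumes the configuration of the original U-Net paper: a 572 × 572 input, two unpadded 3 × 3 convolutions per level, 2 × 2 max pooling, 64 filters doubled at each of four levels, 2 × 2 up-convolutions, and two output classes. Many later implementations pad their convolutions so that the output matches the input size. The function names are hypothetical.

```python
def double_conv(size):
    """Two unpadded 3x3 convolutions, each trimming one pixel from every border."""
    return size - 4


def unet_trace(input_size=572, base_channels=64, depth=4, num_classes=2):
    """Return (stage, spatial size, channels) for each stage of a U-Net forward pass."""
    trace, skips = [], []
    size, channels = input_size, base_channels
    for level in range(1, depth + 1):
        size = double_conv(size)
        trace.append((f"encoder {level}", size, channels))
        skips.append((size, channels))  # kept for the matching decoder level
        size, channels = size // 2, channels * 2  # 2x2 max pooling; next level doubles filters
    size = double_conv(size)
    trace.append(("bottleneck", size, channels))
    for skip_size, skip_channels in reversed(skips):
        size, channels = size * 2, channels // 2  # 2x2 up-convolution
        crop = (skip_size - size) // 2  # encoder map is centre-cropped by this much per side
        trace.append((f"concat skip (crop {crop} px/side)", size, channels + skip_channels))
        size, channels = double_conv(size), skip_channels
        trace.append(("decoder", size, channels))
    trace.append(("1x1 conv", size, num_classes))
    return trace


for stage in unet_trace():
    print(stage)
```

With these settings the bottleneck is 28 × 28 × 1024 and the output is 388 × 388 × 2. Each encoder map must be cropped before concatenation because the unpadded convolutions make the decoder maps smaller than their encoder counterparts.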

V-Net:
The V-Net is a three-dimensional convolutional neural network primarily designed for volumetric medical image segmentation. It was introduced by Milletari et al. in 2016.(8) The core principle of the V-Net is to extend the concept of U-Net to 3D volumes, making it suitable for segmenting volumetric medical data such as MRI or CT scans. The architecture aims to accurately delineate structures within three-dimensional images, which is crucial in medical image analysis.
The main components and operation of the V-Net architecture include:

Encoder-Decoder Structure:
Similar to the U-Net, the V-Net follows an encoder-decoder structure. The encoder extracts high-level features with convolutional layers and reduces resolution using strided convolutions instead of pooling, while the decoder upsamples the feature maps and generates the segmentation.

3D Convolution and Deconvolution:
Because medical images are volumetric, the V-Net uses 3D convolutions for feature extraction and 3D deconvolutions (transposed convolutions) for upsampling. This enables the network to capture spatial relationships and structural information throughout the volume.
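A 3D convolution slides a volumetric kernel over depth, height, and width. The sketch below is a naive single-channel version without bias or activation, written for clarity rather than speed; the helper names are hypothetical. In the original V-Net, feature-extraction layers use 5 × 5 × 5 kernels and downsampling is performed by 2 × 2 × 2 convolutions with stride 2, which is the case the example demonstrates.

```python
def conv3d(volume, kernel, stride=1):
    """Unpadded 3D convolution of a single-channel volume[z][y][x] with kernel[a][b][c].

    Like most deep-learning frameworks, this computes a cross-correlation
    (the kernel is not flipped).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_d = (depth - kd) // stride + 1
    out_h = (height - kh) // stride + 1
    out_w = (width - kw) // stride + 1
    return [[[sum(volume[z * stride + a][y * stride + b][x * stride + c] * kernel[a][b][c]
                  for a in range(kd) for b in range(kh) for c in range(kw))
              for x in range(out_w)]
             for y in range(out_h)]
            for z in range(out_d)]


def ones(d, h, w):
    """A d x h x w volume filled with 1.0."""
    return [[[1.0] * w for _ in range(h)] for _ in range(d)]


# A 2x2x2 kernel with stride 2 halves every dimension of the volume.
down = conv3d(ones(4, 4, 4), ones(2, 2, 2), stride=2)
print(len(down), len(down[0]), len(down[0][0]), down[0][0][0])  # 2 2 2 8.0
```

Unlike max pooling, the strided convolution has learnable weights, so the network can learn how to summarise each 2 × 2 × 2 neighbourhood.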

Residual Connections:
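Each stage of the V-Net learns a residual function: the input of the stage is added to the output of its convolutional layers. These residual connections aid gradient propagation and help alleviate the vanishing gradient problem, contributing to more stable and efficient training. In addition, as in U-Net, features from each encoder stage are forwarded to the corresponding decoder stage to preserve fine detail.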

Volumetric Segmentation:
The final layer of the V-Net is a 1 × 1 × 1 convolution followed by a voxel-wise softmax. This produces volumetric segmentations with class probabilities for each voxel, indicating the presence or absence of the structures of interest. The V-Net is well suited to volumetric medical image segmentation because it processes 3D volumes directly while preserving spatial information and contextual detail, and its residual connections improve training efficiency and convergence.

SegNet:
SegNet, developed by researchers at the University of Cambridge, is a convolutional network designed for multiclass pixel-wise segmentation.(9) Its architecture comprises an encoder and a decoder.
In the encoder, convolutional layers with batch normalization and ReLU activation are followed by non-overlapping max pooling for downsampling. The encoder uses 13 convolutional layers adapted from the VGG-16 model. Importantly, the positions (indices) of the maximum values selected in each pooling window are stored for the later upsampling.
The decoder combines upsampling with convolution. The max pooling indices stored by the encoder guide the upsampling, placing each value back at its original position and thereby reconstructing the spatial dimensions; trainable convolutions then densify the resulting sparse feature maps. At the end of the decoder, a softmax classifier is applied to each pixel to predict its class label.
The accompanying figure illustrates how each upsampling step is linked to its corresponding encoder layer. This index-based upsampling helps restore spatial information accurately.
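The index-based upsampling can be illustrated on a small 2D map. The sketch assumes 2 × 2 non-overlapping pooling and input dimensions divisible by the window size; indices are stored as flat positions (row × width + column), and the function names are hypothetical.

```python
def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling that also records where each maximum came from."""
    width = len(x[0])
    pooled, indices = [], []
    for i in range(0, len(x), size):
        vals, idxs = [], []
        for j in range(0, width, size):
            window = [(x[r][c], r * width + c)
                      for r in range(i, i + size) for c in range(j, j + size)]
            value, index = max(window, key=lambda item: item[0])
            vals.append(value)
            idxs.append(index)
        pooled.append(vals)
        indices.append(idxs)
    return pooled, indices


def max_unpool(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; all other cells are zero."""
    out = [[0.0] * width for _ in range(height)]
    for vals, idxs in zip(pooled, indices):
        for value, index in zip(vals, idxs):
            out[index // width][index % width] = value
    return out


x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 7, 2],
     [6, 3, 1, 1]]
pooled, indices = max_pool_with_indices(x)
print(pooled)  # [[4, 5], [6, 7]]
print(max_unpool(pooled, indices, 4, 4))
```

The unpooled map is sparse: only the four recorded positions are non-zero. In SegNet, the decoder's trainable convolutions then turn such sparse maps into dense feature maps.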

Fast R-CNN:
Fast R-CNN is a neural network architecture designed for efficient object detection in images.
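It improves upon the earlier R-CNN model with several innovations that streamline the detection process.(10) The key principle of Fast R-CNN is to run the convolutional layers once over the whole image and then classify and refine all candidate regions with a single, unified network. Fast R-CNN still receives its region proposals from an external method such as Selective Search; in the follow-up Faster R-CNN model, a Region Proposal Network (RPN) operates on the shared convolutional features and predicts objectness scores and bounding-box offsets for potential regions.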

RoI Pooling:
After region proposals are obtained, RoI (Region of Interest) pooling max-pools each region's portion of the shared convolutional feature map into a fixed-size grid (for example, 7 × 7) that can be fed into the fully connected layers. This allows consistent features to be extracted for each region, regardless of its size or location.
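RoI max pooling divides a region into a fixed grid of sub-windows and takes the maximum in each, so regions of any size yield outputs of the same shape. The sketch below assumes the RoI has already been projected onto feature-map coordinates as inclusive `(x0, y0, x1, y1)` bounds, uses floor/ceil bin boundaries (one common convention), and pools to 2 × 2 for readability where Fast R-CNN with VGG-16 uses 7 × 7. The function name is hypothetical.

```python
import math


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x0, y0, x1, y1), inclusive, into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0 + 1) / out_h
    bin_w = (x1 - x0 + 1) / out_w
    output = []
    for ph in range(out_h):
        h_start = y0 + math.floor(ph * bin_h)
        h_end = y0 + math.ceil((ph + 1) * bin_h)  # exclusive; bins may overlap by one cell
        row = []
        for pw in range(out_w):
            w_start = x0 + math.floor(pw * bin_w)
            w_end = x0 + math.ceil((pw + 1) * bin_w)
            row.append(max(feature_map[r][c]
                           for r in range(h_start, h_end)
                           for c in range(w_start, w_end)))
        output.append(row)
    return output


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 3, 1, 1]]
print(roi_max_pool(feature_map, (0, 0, 3, 3)))  # [[4, 5], [6, 7]]
print(roi_max_pool(feature_map, (0, 0, 2, 2)))  # [[4, 3], [4, 7]]
```

Both the 4 × 4 and the 3 × 3 region produce a 2 × 2 output, which is what allows every RoI to share the same fully connected layers.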

Feature Extraction:
The pooled RoI feature maps are passed through a sequence of fully connected layers, yielding a fixed-length feature vector for each RoI.

Object Classification and Localization:
The extracted features feed two sibling output layers: a softmax classifier that predicts class probabilities for each RoI (including a background class) and a bounding-box regressor that predicts refined box coordinates for each class.
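Fast R-CNN's bounding-box regression reuses the offset parameterization of R-CNN: for each class, the network predicts shifts of the box centre relative to the box size and log-space scale factors for the width and height. A minimal decoding sketch, assuming boxes are given as centre coordinates plus width and height (the function name and example values are hypothetical):

```python
import math


def apply_box_deltas(box, deltas):
    """Refine a proposal box (x_center, y_center, width, height) with predicted
    offsets (dx, dy, dw, dh): centres shift in proportion to the box size and
    the width and height are scaled by exp(dw) and exp(dh)."""
    px, py, pw, ph = box
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))


# A hypothetical proposal and the offsets predicted for one class.
refined = apply_box_deltas((50.0, 50.0, 20.0, 10.0), (0.1, -0.2, 0.0, math.log(2.0)))
print(refined)  # approximately (52.0, 48.0, 20.0, 20.0)
```

Expressing the offsets relative to the proposal size makes the regression targets comparable across small and large boxes.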
The main advantage of Fast R-CNN is its efficiency. By computing convolutional features once for the entire image and sharing them across all RoIs, it avoids the redundant per-region computation of R-CNN, which ran the CNN separately on every proposal. This yields much faster training and inference without compromising detection accuracy.
Fast R-CNN's streamlined approach makes it considerably faster than its predecessor, although it still relies on a separate region proposal step. It combines feature extraction, object classification, and bounding-box regression into a single framework, leading to more efficient and accurate object detection than R-CNN.
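
ResNet:
The building block of a Residual Network (ResNet) is defined by one of the following two equations, where $x$ and $y$ are the input and output vectors of the layers considered and $\mathcal{F}(x, \{W_i\})$ is the residual mapping to be learned:
$$y = \mathcal{F}(x, \{W_i\}) + x$$
$$y = \mathcal{F}(x, \{W_i\}) + W_s x$$
The first form uses an identity shortcut and applies when $x$ and $\mathcal{F}$ have the same dimensions; in the second, a linear projection $W_s$ on the shortcut matches the dimensions when they differ.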
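The two equations can be illustrated with a toy residual block in plain Python. This is a sketch under a stated assumption: F is taken to be two linear layers with a ReLU in between, standing in for the stacked convolutional layers of a real ResNet block; the function names and weights are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def residual_block(x, w1, w2, w_s=None):
    """y = F(x, {W_i}) + x, or y = F(x, {W_i}) + W_s x when a projection w_s is given.

    Assumption: F(x) = w2 @ relu(w1 @ x), a stand-in for stacked conv layers.
    """
    f = matvec(w2, relu(matvec(w1, x)))
    shortcut = x if w_s is None else matvec(w_s, x)
    if len(f) != len(shortcut):
        raise ValueError("dimensions differ: supply a projection matrix w_s")
    return [a + b for a, b in zip(f, shortcut)]


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, zeros))  # [1.0, -2.0]: with F = 0 the block is the identity
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]   # F maps 2 -> 3 dimensions
w_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # projection shortcut, 2 -> 3
print(residual_block(x, w1, w2, w_s))  # [2.0, -2.0, 0.0]
```

Setting the weights of F to zero reduces the block to the identity mapping, which is why residual blocks make identity mappings easy to learn.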

DenseNet is used for object recognition and has been reported to outperform ResNet in this domain, reaching comparable or better accuracy with fewer parameters. The two architectures share a similar design, but one modification is pivotal to DenseNet's results: it combines earlier feature maps by concatenation, whereas ResNet combines them by addition. Because the concatenated feature maps accumulate within each dense block, a straightforward implementation is memory-intensive, so training typically relies on GPUs with sufficient memory.(6)

Fast R-CNN combines feature extraction, region classification, and bounding-box regression in a single forward pass over the image, eliminating the separate stages of R-CNN. Its region proposals, however, still come from an external method such as Selective Search; generating proposals directly from the image with an internal Region Proposal Network (RPN) was introduced in the follow-up Faster R-CNN model.